A few packages for spatial analysis and visualization in R

  1. ggmap: A collection of functions to visualize spatial data and models on top of static maps from various online sources (e.g., Google Maps and Stamen Maps). It includes tools common to those tasks, including functions for geolocation and routing.
  2. rgdal: Bindings to the Geospatial Data Abstraction Library (GDAL), providing access to projection/transformation operations.
  3. GISTools: Mapping and spatial data manipulation tools – in particular, drawing choropleth maps.
  4. ggplot2: A system for ‘declaratively’ creating graphics, based on “The Grammar of Graphics”. You provide the data, tell ‘ggplot2’ how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
  5. ggsn: Adds north symbols and scale bars to maps created with ggplot2 or ggmap.
  6. geosphere: Spherical trigonometry for geographic applications; that is, compute distances and related measures for angular (longitude/latitude) locations.
  7. gmapsdistance: Get distance and travel time between two points from Google Maps. Four possible modes of transportation (bicycling, walking, driving and public transportation).
#install.packages(c("ggmap","rgdal","GISTools","ggplot2","ggsn","geosphere","gmapsdistance"), repos="https://cran.rstudio.com")
library(ggmap)
library(rgdal)
library(GISTools)
library(ggplot2)
library(geosphere)
library(gmapsdistance)
library(ggsn)

Introduction to geocoding

In geospatial analyses, string addresses hold very little meaning, just as in our everyday lives the precise latitude and longitude coordinates of our homes are not all that useful. That’s where geocoding comes in. The basis of geocoding is the following (paraphrasing Google’s description of their Google Maps Geocoding API): geocoding is the process of converting addresses into geographic coordinates (latitude and longitude), which you can then use to place markers on a map or position the map.

Before we begin, here are a few “best practices” when preparing addresses to be geocoded (taken from the Harvard University Center for Geographic Analysis):

  1. Remove suite/apartment numbers, as they will simply create “ties”.
  2. Be mindful of special characters like @, #, ?, ;, -.
  3. Buckle down and be prepared to spend a good long while “cleaning” addresses. I spent two days on my pharmacy addresses. I would highly recommend doing it with a quality latte (or perhaps something stronger) in hand.
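The first two of these rules can be partially automated with regular expressions. A minimal sketch, using made-up addresses and a deliberately simple pattern for suite/apartment designators:

```r
# Hypothetical raw addresses illustrating common problems
raw <- c("2505 Steck Ave. Apt #12; Austin TX 78757",
         "600 River Street, Suite 300; Austin TX 78701")

# Rule 1: strip suite/apartment designators and whatever number follows them
clean <- gsub("\\b(Apt|Suite|Ste|Unit)\\b\\.?\\s*#?\\S*", "", raw,
              ignore.case = TRUE, perl = TRUE)

# Rule 2: drop leftover special characters like #, ;, @, and stray commas
clean <- gsub("[#;@?,]", "", clean)

# Collapse the double spaces created by the removals
clean <- trimws(gsub("\\s+", " ", clean))
clean
```

This leaves “2505 Steck Ave. Austin TX 78757” and “600 River Street Austin TX 78701”; a real cleaning pass will need more patterns than this, which is where the two days come in.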

Example 1: City of Austin Public Art Collection

Background: This file contains the names and address of artworks commissioned through the Austin Art in Public Places Program. Established by the City in 1985, the Art in Public Places (AIPP) program collaborates with local & nationally-known artists to include the history and values of our community into cultural landmarks that have become cornerstones of Austin’s identity. The City of Austin was the first municipality in Texas to make a commitment to include works of art in construction projects (taken from here).

AustinPublicArt <- read.csv("https://query.data.world/s/93ujuqzxbfjkwnda3y2p8mgv9", header=TRUE, stringsAsFactors=FALSE)
colnames(AustinPublicArt) 
## [1] "artist_full_name"            "art_title"                  
## [3] "art_location_name"           "art_location_street_address"
## [5] "art_location_city"           "art_location_state"         
## [7] "art_location_zip"
AustinPublicArt$art_full_address <- paste(AustinPublicArt$art_location_street_address,AustinPublicArt$art_location_city,AustinPublicArt$art_location_state,AustinPublicArt$art_location_zip, sep = " ")
unique(AustinPublicArt$art_full_address) #let's see what we're working with here
##  [1] "Possum Point - North Bank at Shoal Creek; Austin TX 78701"                           
##  [2] "2800  Star of Texas Rd.; Austin TX 78719"                                            
##  [3] "2505 Steck Ave.; Austin TX 78757"                                                    
##  [4] "1143 Northwestern Ave.; Austin TX 78702"                                             
##  [5] "600 River Street; Austin TX 78701"                                                   
##  [6] "1163 Angelina St.; Austin TX 78702"                                                  
##  [7] "12500 Amherst Dr.; Austin TX 78727"                                                  
##  [8] "500 East Cesar Chavez St.; Austin TX 78701"                                          
##  [9] "7201 Levander Loop; Austin TX 78702"                                                 
## [10] "2400 Holly St.; Austin TX 78702"                                                     
## [11] "1105 E. Cesar Chavez Street; Austin TX 78702"                                        
## [12] "5803 Nuckols Crossing; Austin TX 78744"                                              
## [13] "3600 Presidential Blvd.; Austin TX 78719"                                            
## [14] "Webberville Road; Austin TX NA"                                                      
## [15] "808 Nile Rd.; Austin TX 78702"                                                       
## [16] "; Austin Texas 78702"                                                                
## [17] "Brush Square Park; 409 East Fifth Street Austin TX 78701"                            
## [18] "5th Street and Sabine Street; Austin TX 78701"                                       
## [19] "625 East 10th Street; Austin TX 78701"                                               
## [20] "7200 Berkman Drive; Austin TX NA"                                                    
## [21] "2220 Barton Springs Rd.; Austin TX 78746"                                            
## [22] "1161 Angelina St.; Austin TX 78702"                                                  
## [23] "8011 Beckett Rd.; Austin TX 78749"                                                   
## [24] "2500 Exposition Blvd.; Austin TX 78703"                                              
## [25] "2101 Jesse E. Segovia Street; Austin TX 78702"                                       
## [26] "West side between 6th and 7th Street; Austin TX NA"                                  
## [27] "3911 Manchaca Rd.; Austin TX 78704"                                                  
## [28] "300 N. Lamar Blvd.; Austin TX 78704"                                                 
## [29] "Cesar Chavez; East and West of S. First Street; Austin TX 78701"                     
## [30] "North Bank near Buford Fire Tower; Austin TX 78701"                                  
## [31] "400 Ralph Ablanedo Drive; Austin TX NA"                                              
## [32] "5801 Westminster Dr.; Austin TX 78723"                                               
## [33] "8637 Spicewood Springs Rd.; Austin TX 78759"                                         
## [34] "300 Cesar Chavez Street; Austin TX NA"                                               
## [35] "5125 Convict Hill Rd.; Austin TX 78749"                                              
## [36] "900 Barton Springs Road; Austin TX NA"                                               
## [37] "2200 Hancock Dr.; Austin TX 78756"                                                   
## [38] "601 E 15th St.; 9th Floor Austin TX 78701"                                           
## [39] "Corner of Blessing and Wheatly Ave.; Austin TX NA"                                   
## [40] "Festival Beach Park along the; Nash Hernandez Sr. Road Austin TX 78702"              
## [41] "300 South Congress Avenue; Austin TX 78704"                                          
## [42] "7201 Colony Loop Drive; Austin TX 78724"                                             
## [43] "5801 Ainez Dr.; Austin TX 78744"                                                     
## [44] "2nd Street & San Antonio; Austin TX 78701"                                           
## [45] "7800 Johnny Morris Road; Austin TX 78724"                                            
## [46] "4108 Todd Lane; Austin TX NA"                                                        
## [47] "2608 Gonzales St.; Austin TX 78702"                                                  
## [48] "1106 E. Rundberg Ln.; Austin TX 78753"                                               
## [49] "; Austin Texas 78701"                                                                
## [50] "3635 RR 620 South; Austin TX 78738"                                                  
## [51] "2201 Barton Springs Rd.; Austin TX 78746"                                            
## [52] "12400 Lamplight Village Ave.; Austin TX 78727"                                       
## [53] "301 Nature Center Dr.; Austin TX 78746"                                              
## [54] "34 Robert T. Martinez Jr. St.; Austin TX NA"                                         
## [55] "5010 Old Manor Rd; Austin TX 78723"                                                  
## [56] "4411 Meinardus Road; Austin TX 78744"                                                
## [57] "2100 E. 3rd St.; Austin TX 78702"                                                    
## [58] "3701 Grooms Street; Austin TX 78705"                                                 
## [59] "812 Springdale Road; Austin TX 78702"                                                
## [60] "5811 Nuckols Crossing Rd.; Austin TX 78744"                                          
## [61] "South Bank at South 1st &  Riverside Drive; Austin TX 78704"                         
## [62] "409 East Fifth St.; Austin TX 78701"                                                 
## [63] "651 North Pleasant Valley  Rd.; Austin TX 78702"                                     
## [64] "8601 Tallwood Dr.; Austin TX 78759"                                                  
## [65] "1201 Webberville Rd.; Austin TX 78721"                                               
## [66] "Veterans Drive; Austin TX NA"                                                        
## [67] "Veterans Drive to Shady Lane; Austin TX NA"                                          
## [68] "South Bank at Barton Creek; Austin TX 78704"                                         
## [69] "3810 Todd Lane; Austin TX NA"                                                        
## [70] "1600 Grove Blvd; Austin TX 78704"                                                    
## [71] "124 West Eighth St.; 3rd Floor; Austin TX 78701"                                     
## [72] "1156 W. Cesar Chavez St.; Austin TX 78703"                                           
## [73] "; Austin TX 78701"                                                                   
## [74] "5400 Jimmy Clay Dr.; Austin TX 78744"                                                
## [75] "1156 Hargrave St.; Austin TX 78702"                                                  
## [76] "11401 Escarpment Blvd.; Austin TX 78749"                                             
## [77] "2201 Barton Springs Rd.; Austin TX 78704"                                            
## [78] "4123 South First St.; Austin TX 78745"                                               
## [79] "1200 Montopolis Drive; Austin TX NA"                                                 
## [80] "Roy Montelongo Scenic Overlook; Canterbury St and Pleasant Valley Rd Austin TX 78702"

Keys for cleaning addresses: all hail sub() and gsub()

If you just want to get rid of the first occurrence of something, use:

  • sub("what you don't want", "what you do want", where you want it)

If you want to get rid of every occurrence of something, use:

  • gsub("what you don't want", "what you do want", where you want it)
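A toy example (with a made-up string) makes the difference concrete:

```r
x <- "5th Street; and Sabine Street;"

sub(";", "", x)   # only the first ";" goes: "5th Street and Sabine Street;"
gsub(";", "", x)  # every ";" goes: "5th Street and Sabine Street"
```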
AustinPublicArt$art_location_street_address <- gsub(";", "", AustinPublicArt$art_location_street_address) #get rid of special character ";" at the end of each address
AustinPublicArt$art_location_street_address <- sub(".* - ", "", AustinPublicArt$art_location_street_address) #get rid of "Possum Point - " at the beginning

Many geocoders exist today (both open-source and proprietary), but the basis is the same. The algorithm begins by taking a complete street address and breaking it down into its component parts. For example, if we wanted to geocode the Vanderbilt Biostatistics Department, we would begin with its complete street address: “2525 West End Avenue Nashville TN 37203”.

The geocoder then works its way down the geographic hierarchy that you specify (depending on the scope of your data) to find increasingly more precise coordinates based on matching the address’s:

  1. State
  2. City
  3. Zip code
  4. Street address
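As a rough sketch of that component-splitting step, here is one way to peel apart a single well-formed address in base R (the regular expressions are hypothetical simplifications; real geocoders use far more robust parsers):

```r
address <- "2525 West End Avenue Nashville TN 37203"

# Peel components off from the right: zip code, then state, then city
zip    <- sub(".*\\s(\\d{5})$", "\\1", address)
state  <- sub(".*\\s([A-Z]{2})\\s\\d{5}$", "\\1", address)
city   <- sub(".*\\s(\\w+)\\s[A-Z]{2}\\s\\d{5}$", "\\1", address)
street <- sub("\\s\\w+\\s[A-Z]{2}\\s\\d{5}$", "", address)

c(street = street, city = city, state = state, zip = zip)
```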

Fortunately, the R community has blessed us with yet another package: ggmap! ggmap contains a quick function called geocode() – tricky to remember, I know – that will take in a string address and output the latitude and longitude coordinates utilizing the Google Maps API or the Data Science Toolkit online (the choice is yours).

geocode("2525 West End Avenue Nashville TN 37203", source="dsk") #source from Data Science Toolkit
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=2525%20West%20End%20Avenue%20Nashville%20TN%2037203&sensor=false
##         lon      lat
## 1 -86.80882 36.14664
geocode("2525 West End Avenue Nashville TN 37203", source="google") #source from Google Maps
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=2525%20West%20End%20Avenue%20Nashville%20TN%2037203&sensor=false
##         lon      lat
## 1 -86.80932 36.14601

We can see that the geocoded coordinates from each source are quite close. I have not (yet) found much information differentiating the two, so for now I choose to proceed with Google Maps. However, keep in mind that this free service limits you to 2,500 queries per day (so make them count!). As I’m writing this tutorial, though, I’ve realized that queries run as an R Markdown file knits do not seem to count…

AustinLatLong <- geocode(AustinPublicArt$art_full_address, source="google")

Now, the unfortunate truth is that we will rarely be able to successfully geocode an entire set of addresses, due to either unforgivable typos or missing data. In our case, we were unable to precisely pinpoint the locations of 10.1% of the works of art. This calculation comes from:

mean(is.na(AustinLatLong))
## [1] 0.1011905

Based on the histograms of the geocoded latitudes and longitudes, there was reason to limit our observations to those with coordinates clustered around the majority. Recall that we are focusing only on the Austin, Texas urban area, so a work of art that is supposedly located more than 1 degree of latitude or longitude away from the median is unrealistic. The rule of thumb is that one degree of latitude is approximately 69 miles, which motivates this exclusion.
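That rule could be applied directly as a median-based filter; a sketch on made-up coordinates with one far-off outlier:

```r
# Hypothetical geocoded coordinates; the fourth point is a geocoding miss
lat  <- c(30.27, 30.35, 30.19, 41.85)
long <- c(-97.74, -97.70, -97.82, -89.48)

# Keep points within 1 degree (roughly 69 miles of latitude) of the median
keep <- abs(lat - median(lat)) < 1 & abs(long - median(long)) < 1
keep  # TRUE TRUE TRUE FALSE: the outlier is dropped
```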

Additionally, since we are seeking only to describe the dispersion of public works of art across Austin, we can focus on the subset that were successfully geocoded (i.e. the latitude and longitude are not NA).

AustinPublicArt$art_location_long <- AustinLatLong[,1]
AustinPublicArt$art_location_lat <- AustinLatLong[,2]
AustinPublicArt <- subset(AustinPublicArt, !is.na(AustinPublicArt$art_location_long))
AustinPublicArt <- subset(AustinPublicArt, AustinPublicArt$art_location_lat < 32)
AustinPublicArt <- subset(AustinPublicArt, AustinPublicArt$art_location_long < -96)

These exclusions leave us with a subset of 149 works of art.

Now that we have geocoded our addresses, we are ready to build our first map!

It all starts with a shapefile

Now that we have data that we’re interested in visualizing on a map, we need a blank canvas. When mapping in R using GISTools, this blank canvas is called a “shapefile.” This is the same file type used for Geographic Information Systems (GIS) software, so I prefer it because there is an abundance of data available in this format. The first place I look for maps of the United States is the U.S. Census Bureau.

Using the readOGR() function in the rgdal package, we can read in the shapefile we’ve downloaded as a usable spatial layer.

setwd("~/Dropbox/R Ladies")
#setwd("C:/Users/DC01336/Documents")
library(rgdal)
UrbanAreasUS <- readOGR(dsn = path.expand("tl_2016_us_uac10/tl_2016_us_uac10.shp"), layer="tl_2016_us_uac10")
## OGR data source with driver: ESRI Shapefile 
## Source: "tl_2016_us_uac10/tl_2016_us_uac10.shp", layer: "tl_2016_us_uac10"
## with 3601 features
## It has 12 fields
## Integer64 fields read as strings:  ALAND10 AWATER10
class(UrbanAreasUS)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"

What the heck is a “Spatial Polygons Data Frame”?

Classes of spatial data

  1. SpatialPoints: simply describe locations (with no attributes)
  2. SpatialPointsDataFrame: describes locations + attributes for them
  3. SpatialPolygonsDataFrame: describes locations + shapes + attributes for them
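A minimal sketch of the difference, building a SpatialPointsDataFrame from a toy data frame with the sp package (the point names and coordinates here are made up):

```r
library(sp)

# Toy attributes plus coordinates
pts <- data.frame(name = c("A", "B"),
                  long = c(-97.74, -97.70),
                  lat  = c(30.27, 30.35))

# Promote to a SpatialPointsDataFrame: locations + attributes
pts_sp <- SpatialPointsDataFrame(coords = pts[, c("long", "lat")],
                                 data   = pts["name"])
class(pts_sp)   # "SpatialPointsDataFrame"
pts_sp@data     # the familiar data frame lives in the @data slot
```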

We can access the familiar data frame component of the UrbanAreasUS shapefile as follows:

head(UrbanAreasUS@data) #or alternatively head(data.frame(UrbanAreasUS))
##   UACE10 GEOID10           NAME10                     NAMELSAD10 LSAD10
## 0  24310   24310        Dixon, IL        Dixon, IL Urban Cluster     76
## 1  27847   27847     Escanaba, MI     Escanaba, MI Urban Cluster     76
## 2  18100   18100 Clintonville, WI Clintonville, WI Urban Cluster     76
## 3  06166   06166      Bedford, IN      Bedford, IN Urban Cluster     76
## 4  75270   75270    Riverdale, CA    Riverdale, CA Urban Cluster     76
## 5  90946   90946      Visalia, CA     Visalia, CA Urbanized Area     75
##   MTFCC10 UATYP10 FUNCSTAT10   ALAND10 AWATER10  INTPTLAT10   INTPTLON10
## 0   G3500       C          S  25524689   938058 +41.8529507 -089.4817439
## 1   G3500       C          S  46488558   283456 +45.7274565 -087.0824457
## 2   G3500       C          S   5854721   502397 +44.6232203 -088.7611283
## 3   G3500       C          S  30403132     2314 +38.8566530 -086.5012383
## 4   G3500       C          S   2306823        0 +36.4310710 -119.8620544
## 5   G3500       U          S 164291020    39512 +36.2934877 -119.3006857

By taking advantage of this data frame, we can focus our map on the Austin, TX urban area using our usual data subsetting techniques. Be careful not to select a subset of only the data frame, because if you lose the “spatial polygon” part, R will not plot your map properly (i.e. as a map).

Choose your adventure: GISTools or ggplot?

To plot this shapefile using GISTools, our work is basically done.

UrbanAreasUS@data[which(UrbanAreasUS@data$NAME10 == "Austin, TX"),] #find the unique identifier for Austin, TX
##      UACE10 GEOID10     NAME10                NAMELSAD10 LSAD10 MTFCC10
## 3567  04384   04384 Austin, TX Austin, TX Urbanized Area     75   G3500
##      UATYP10 FUNCSTAT10    ALAND10 AWATER10  INTPTLAT10   INTPTLON10
## 3567       U          S 1356225919 11012063 +30.3585141 -097.7615330
AustinTX <- subset(UrbanAreasUS, UrbanAreasUS@data$NAME10 == "Austin, TX")
par(mar=c(1,1,1,1))
plot(AustinTX)

Now that we have a basic outline of the Austin, TX urban area we can dive into the fun stuff: customization! Feel free to play around with the following aesthetics within your plot() function:

par(mar=c(1,1,1,1))
plot(AustinTX, asp = 1, col="darkorange3", bg="wheat", lty=1, main = "Austin, Texas Urban Area")
# locator()
maps::map.scale(-97.49575, 30.06154, metric=FALSE, ratio = FALSE)
library(GISTools)
north.arrow(-98.11653, 30.64451, len=1/40, col="white")

To map Austin in ggplot instead, we need to use the fortify() function to convert the spatial polygons data frame into a traditional data frame. To be able to subset the exhaustive U.S. Urban Areas down to Austin, TX, we use the region argument to keep the “NAME10” unique identifiers.

UrbanAreasUS.df <- fortify(UrbanAreasUS, region = "NAME10") #convert spatial polygon data frame --> data frame
AustinTX.df <- subset(UrbanAreasUS.df, UrbanAreasUS.df$id == "Austin, TX")

Once the shapefile has been formatted appropriately, we can utilize our knowledge of the “grammar of graphics” to map the urban area for Austin, TX once more. We will need the additional ggsn package to add north arrows and map scales for orientation.

AustinTXMap <- ggplot() + geom_polygon(data = AustinTX.df, aes(x = long, y = lat, group=group), fill = "orange") + ggtitle("Austin, TX Urban Area") + north(AustinTX.df, location = "bottomright")
AustinTXMap

Map of the City of Austin’s Public Art Collection

First, we need to create a new SpatialPointsDataFrame so that R recognizes the geocoded latitude and longitude that we found as coordinates.

AustinPublicArt_SP <- SpatialPointsDataFrame(coords = AustinPublicArt[,9:10], data = AustinPublicArt[,1:8])

Now we can begin with our base map from above, and overlay it with the locations of works of street art! When plotting spatial point data, we have a new selection of aesthetics to play with:

library(GISTools)
#recreate our beautiful base map of Austin from above
par(mar=c(1,1,1,1))
plot(AustinTX, asp = 1, col="white",bg="wheat", lty=1, main = "City of Austin Public Art Collection")
maps::map.scale(-97.49575, 30.06154, metric=FALSE, ratio = FALSE)
north.arrow(-98.11653, 30.64451, len=1/40, col="darkorange1")
plot(AustinPublicArt_SP, add=TRUE, col="darkorange3", lwd=1.5)

From the map above, we should be able to make some preliminary observations about the spatial distribution of public artwork across the urban area for the city of Austin, Texas. My primary observation was a large cluster of pieces toward the center of the area. Another way to inspect potential “clustering” in the locations of public art would be with a Kernel Density Estimate Map.

Kernel Density Estimation Maps

(I thought those were called heatmaps? So did I.)

Map of the density of public artwork across Austin, Texas

KDE maps are shaded according to the kernel density estimate of a given value or attribute across space. In our case, we are interested in the spatial distribution of public artwork across the Austin urban area.

library(GISTools)
ArtDensity <- kde.points(AustinPublicArt_SP, lims=AustinTX)
par(mar=c(1,1,1,1))
level.plot(ArtDensity)
plot(AustinTX, add=TRUE)
title("Spatial distribution of public art across Austin, TX")

Perhaps policy makers requested this analysis so that they could assess how the City of Austin Public Arts Collection is bringing enrichment to the people of Austin and if the collection is accessible to everyone. How would we answer their question?

How about estimating the average distance between works of art as a gauge of availability/accessibility?

Calculating distances between places on a map

When we think about distances traveled, how do we measure them?

  1. Distance (shortest route possible between two points)
    1. Haversine Formula: calculates the “great-circle” distance between two points (i.e. “as the crow flies”). Performs well even at short distances. \[d(i,j) = 2r\arcsin\left(\sqrt{\sin^2\left(\frac{\text{lat}_j-\text{lat}_i}{2}\right) + \cos(\text{lat}_i)\cos(\text{lat}_j)\sin^2\left(\frac{\text{long}_j - \text{long}_i}{2}\right)}\right)\]
    2. Spherical Law of Cosines: gives reliable estimates down to a few meters, with a simpler formula than the Haversine Formula. \[d(i,j) = r \cdot \arccos\left(\sin(\text{lat}_i)\sin(\text{lat}_j) + \cos(\text{lat}_i)\cos(\text{lat}_j)\cos(\text{long}_j-\text{long}_i)\right)\]
    3. Where \(r = 3959\), the radius of the Earth in miles. If we were interested in calculating distances in kilometers or some other unit, we would replace \(r\) with the appropriate value.
  2. Distance (incorporating transportation networks to measure true route between two points)
  3. Travel time (incorporating transportation networks to measure true route between two points)
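As a sanity check on the two great-circle formulas, both are easy to implement in base R. This sketch assumes coordinates in decimal degrees and returns miles; the example coordinates are arbitrary points near downtown Austin:

```r
# Haversine Formula: great-circle distance, robust at short distances
haversine <- function(lat1, long1, lat2, long2, r = 3959) {
  to_rad <- pi / 180
  lat1 <- lat1 * to_rad; lat2 <- lat2 * to_rad
  dlat  <- lat2 - lat1
  dlong <- (long2 - long1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1) * cos(lat2) * sin(dlong / 2)^2
  2 * r * asin(sqrt(a))
}

# Spherical Law of Cosines: simpler formula, same idea
law_of_cosines <- function(lat1, long1, lat2, long2, r = 3959) {
  to_rad <- pi / 180
  lat1 <- lat1 * to_rad; lat2 <- lat2 * to_rad
  dlong <- (long2 - long1) * to_rad
  r * acos(sin(lat1) * sin(lat2) + cos(lat1) * cos(lat2) * cos(dlong))
}

haversine(30.2672, -97.7431, 30.2500, -97.7500)       # ~1.26 miles
law_of_cosines(30.2672, -97.7431, 30.2500, -97.7500)  # agrees closely
```

At this distance the two formulas agree to well within a foot, consistent with the claims above.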

Which of these measures you choose might depend on the nature or scope of your project, but since our data set is reasonably manageable I thought we could experiment with all three. First, you need to quickly get your unique API key from Google.

Using the geosphere package for Haversine Formula and Spherical Law of Cosines

The geosphere package contains the functions distHaversine() and distCosine(), which calculate the distance between either two vectors of longitude/latitude coordinates or matrices with two columns for longitude/latitude, using the Haversine Formula and Spherical Law of Cosines, respectively.

#set.api.key("MYKEY")
library(geosphere) #contains functions for Haversine Formula and Law of Cosines

#Use Haversine Formula

#Input: lat/long coordinates of the 1st and 2nd works of art
#Output: distance between 1st and 2nd works of art (in miles)
distHaversine(AustinPublicArt_SP@coords[1,],  AustinPublicArt_SP@coords[2,], r = 3959)
## [1] 8.013604
#Input: lat/long coordinates of the 1st vs. all other works of art
#Output: distances between the 1st and all other works of art (in miles)
distHaversine(AustinPublicArt_SP@coords[1,], AustinPublicArt_SP@coords[-1,], r = 3959)
##   [1]  8.013604  5.416017  4.016146  4.931046 12.040386  4.012327  8.013604
##   [8]  7.000807  5.344003  7.000807  7.000807  4.390746  0.000000  6.682523
##  [15]  9.458376  7.000807  4.016146  0.000000  7.059346  5.697516  5.514135
##  [22]  4.034459  4.189114  7.000807  4.012327  4.296921  4.012327  9.458376
##  [29]  2.104308  4.933840  5.344003  4.970950  3.197082  5.025598  5.315432
##  [36]  2.650919  5.697516  0.000000  9.458376  3.067108  5.344003  3.711550
##  [43]  3.676443  3.197082  0.000000  2.650919  7.618972  0.000000 11.721593
##  [50]  9.458376  5.344003  5.416017  4.448432  3.230634  5.561036  4.381507
##  [57]  8.229934  3.230634  4.012327  4.012327  4.381507  3.608513 10.402236
##  [64]  9.458376  4.016146  6.809278  7.000807  0.000000  5.561036  9.458376
##  [71]  4.390746 10.509772  9.458376  0.000000  5.686198  0.000000  9.614367
##  [78]  3.740396  3.740396  2.153834  2.104308 12.079088  4.933840  9.458376
##  [85]  5.344003  1.928873  7.243444  5.344003  5.046853  5.344003  8.229934
##  [92]  9.458376  4.970950  7.988015 12.079088  1.928873  5.158378  5.344003
##  [99]  4.976240  5.344003  6.369227  3.740396  6.692322  4.381507  4.034459
## [106]  6.809278  5.344003  4.390746  5.697516  4.028166  5.856212  8.354896
## [113]  7.351420  8.229934  0.000000  1.575877  6.369227  4.012327  6.464308
## [120]  3.810005  6.682523  5.344003  2.833547  0.000000  6.369227  5.416752
## [127]  5.416752  8.229934  1.575877  3.740396  5.344003  7.000807  7.000807
## [134]  5.416017  9.458376  7.246306  5.725588  9.458376  7.928910  5.514135
## [141]  1.928873  6.682523  2.153834  3.623156  3.740396  4.296921  6.800969
## [148]  5.344003

We can take the mean of this last command’s output to calculate the average distance to other works of art for each piece (our proposed gauge of accessibility). To find the average distance from each of the 149 works of art, we can construct a distance matrix using the distm() function in geosphere and then take the colMeans() of that matrix.

DistanceMatrix_Haversine <- distm(data.frame(AustinPublicArt_SP@coords), data.frame(AustinPublicArt_SP@coords), distHaversine)*0.000621371
DistanceMatrix_Haversine[1,]
##   [1]  0.000000  8.022084  5.421749  4.020396  4.936264 12.053127  4.016573
##   [8]  8.022084  7.008216  5.349658  7.008216  7.008216  4.395392  0.000000
##  [15]  6.689594  9.468385  7.008216  4.020396  0.000000  7.066816  5.703545
##  [22]  5.519970  4.038728  4.193547  7.008216  4.016573  4.301468  4.016573
##  [29]  9.468385  2.106535  4.939061  5.349658  4.976210  3.200465  5.030916
##  [36]  5.321057  2.653724  5.703545  0.000000  9.468385  3.070353  5.349658
##  [43]  3.715477  3.680333  3.200465  0.000000  2.653724  7.627035  0.000000
##  [50] 11.733996  9.468385  5.349658  5.421749  4.453139  3.234052  5.566920
##  [57]  4.386143  8.238643  3.234052  4.016573  4.016573  4.386143  3.612331
##  [64] 10.413243  9.468385  4.020396  6.816483  7.008216  0.000000  5.566920
##  [71]  9.468385  4.395392 10.520894  9.468385  0.000000  5.692215  0.000000
##  [78]  9.624541  3.744354  3.744354  2.156113  2.106535 12.091869  4.939061
##  [85]  9.468385  5.349658  1.930915  7.251109  5.349658  5.052194  5.349658
##  [92]  8.238643  9.468385  4.976210  7.996468 12.091869  1.930915  5.163837
##  [99]  5.349658  4.981506  5.349658  6.375967  3.744354  6.699403  4.386143
## [106]  4.038728  6.816483  5.349658  4.395392  5.703545  4.032429  5.862409
## [113]  8.363737  7.359199  8.238643  0.000000  1.577544  6.375967  4.016573
## [120]  6.471148  3.814037  6.689594  5.349658  2.836545  0.000000  6.375967
## [127]  5.422484  5.422484  8.238643  1.577544  3.744354  5.349658  7.008216
## [134]  7.008216  5.421749  9.468385  7.253973  5.731647  9.468385  7.937300
## [141]  5.519970  1.930915  6.689594  2.156113  3.626990  3.744354  4.301468
## [148]  6.808166  5.349658
diag(DistanceMatrix_Haversine) <- NA #remove 0 values for distance b/t a work of art and itself
AustinPublicArt_SP@data$avg_distance_haversine <- colMeans(DistanceMatrix_Haversine, na.rm=TRUE)

Just for fun, let’s look at the histogram of average distance from each piece of art to the rest of the collection.

The inputs for the distCosine() function are the same, and the calculations will be similar for most reasonable distances. Let’s quickly repeat the steps above, but now using the Spherical Law of Cosines.

library(geosphere)

#Use Spherical Law of Cosines

#Input: lat/long coordinates of the 1st and 2nd works of art
#Output: distance between 1st and 2nd works of art (in miles)
distCosine(AustinPublicArt_SP@coords[1,],  AustinPublicArt_SP@coords[2,], r = 3959)
## [1] 8.013604
#Input: lat/long coordinates of the 1st vs. all other works of art
#Output: distances between the 1st and all other works of art (in miles)
distCosine(AustinPublicArt_SP@coords[1,], AustinPublicArt_SP@coords[-1,], r = 3959)
##   [1]  8.013604  5.416017  4.016146  4.931046 12.040386  4.012327  8.013604
##   [8]  7.000807  5.344003  7.000807  7.000807  4.390746  0.000000  6.682523
##  [15]  9.458376  7.000807  4.016146  0.000000  7.059346  5.697516  5.514135
##  [22]  4.034459  4.189114  7.000807  4.012327  4.296921  4.012327  9.458376
##  [29]  2.104308  4.933840  5.344003  4.970950  3.197082  5.025598  5.315432
##  [36]  2.650919  5.697516  0.000000  9.458376  3.067108  5.344003  3.711550
##  [43]  3.676443  3.197082  0.000000  2.650919  7.618972  0.000000 11.721593
##  [50]  9.458376  5.344003  5.416017  4.448432  3.230634  5.561036  4.381507
##  [57]  8.229934  3.230634  4.012327  4.012327  4.381507  3.608513 10.402236
##  [64]  9.458376  4.016146  6.809278  7.000807  0.000000  5.561036  9.458376
##  [71]  4.390746 10.509772  9.458376  0.000000  5.686198  0.000000  9.614367
##  [78]  3.740396  3.740396  2.153834  2.104308 12.079088  4.933840  9.458376
##  [85]  5.344003  1.928873  7.243444  5.344003  5.046853  5.344003  8.229934
##  [92]  9.458376  4.970950  7.988015 12.079088  1.928873  5.158378  5.344003
##  [99]  4.976240  5.344003  6.369227  3.740396  6.692322  4.381507  4.034459
## [106]  6.809278  5.344003  4.390746  5.697516  4.028166  5.856212  8.354896
## [113]  7.351420  8.229934  0.000000  1.575877  6.369227  4.012327  6.464308
## [120]  3.810005  6.682523  5.344003  2.833547  0.000000  6.369227  5.416752
## [127]  5.416752  8.229934  1.575877  3.740396  5.344003  7.000807  7.000807
## [134]  5.416017  9.458376  7.246306  5.725588  9.458376  7.928910  5.514135
## [141]  1.928873  6.682523  2.153834  3.623156  3.740396  4.296921  6.800969
## [148]  5.344003
DistanceMatrix_Cosine <- distm(data.frame(AustinPublicArt_SP@coords), data.frame(AustinPublicArt_SP@coords), distCosine)*0.000621371
DistanceMatrix_Cosine[1,]
##   [1]  0.000000  8.022084  5.421749  4.020396  4.936264 12.053127  4.016573
##   [8]  8.022084  7.008216  5.349658  7.008216  7.008216  4.395392  0.000000
##  [15]  6.689594  9.468385  7.008216  4.020396  0.000000  7.066816  5.703545
##  [22]  5.519970  4.038728  4.193547  7.008216  4.016573  4.301468  4.016573
##  [29]  9.468385  2.106535  4.939061  5.349658  4.976210  3.200465  5.030916
##  [36]  5.321057  2.653724  5.703545  0.000000  9.468385  3.070353  5.349658
##  [43]  3.715477  3.680333  3.200465  0.000000  2.653724  7.627035  0.000000
##  [50] 11.733996  9.468385  5.349658  5.421749  4.453139  3.234052  5.566920
##  [57]  4.386143  8.238643  3.234052  4.016573  4.016573  4.386143  3.612331
##  [64] 10.413243  9.468385  4.020396  6.816483  7.008216  0.000000  5.566920
##  [71]  9.468385  4.395392 10.520894  9.468385  0.000000  5.692215  0.000000
##  [78]  9.624541  3.744354  3.744354  2.156113  2.106535 12.091869  4.939061
##  [85]  9.468385  5.349658  1.930915  7.251109  5.349658  5.052194  5.349658
##  [92]  8.238643  9.468385  4.976210  7.996468 12.091869  1.930915  5.163837
##  [99]  5.349658  4.981506  5.349658  6.375967  3.744354  6.699403  4.386143
## [106]  4.038728  6.816483  5.349658  4.395392  5.703545  4.032429  5.862409
## [113]  8.363737  7.359199  8.238643  0.000000  1.577544  6.375967  4.016573
## [120]  6.471148  3.814037  6.689594  5.349658  2.836545  0.000000  6.375967
## [127]  5.422484  5.422484  8.238643  1.577544  3.744354  5.349658  7.008216
## [134]  7.008216  5.421749  9.468385  7.253973  5.731647  9.468385  7.937300
## [141]  5.519970  1.930915  6.689594  2.156113  3.626990  3.744354  4.301468
## [148]  6.808166  5.349658
diag(DistanceMatrix_Cosine) <- NA #remove 0 values for distance b/t a work of art and itself
AustinPublicArt_SP@data$avg_distance_cosine <- colMeans(DistanceMatrix_Cosine, na.rm=TRUE)

We can now compare the performances of the Haversine Formula and Spherical Law of Cosines.
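One quick check is the element-wise difference between the two average-distance columns. This is a sketch that assumes the Haversine averages from the earlier step were stored in an `avg_distance_haversine` column, mirroring the `avg_distance_cosine` column created above:

```r
# Compare average distances (in miles) from the two great-circle formulas;
# assumes avg_distance_haversine was created in the earlier Haversine step
summary(AustinPublicArt_SP@data$avg_distance_haversine -
        AustinPublicArt_SP@data$avg_distance_cosine)
```

Judging from the first rows of the two distance matrices printed above, the formulas agree to within about a hundredth of a mile at this scale.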

Using the gmapsdistance package for travel times and distances based on road networks

Depending on the focus of the spatial analysis, it may be more prudent to calculate distances or travel times over networks of roads and sidewalks rather than “as the crow flies” (the shortest possible distance between two points). The gmapsdistance package accomplishes this through the Google Maps API directly in R, calculating travel distances (output in meters) and times (output in seconds) by a variety of modes (walking, bicycling, driving, or public transportation), at different times of day, and under different traffic conditions (“pessimistic”, “optimistic”, “best guess”).

The inputs for the gmapsdistance() function are a little different. The origin and destination can be either named places of the form “CITY+STATE” or pairs of latitude and longitude coordinates of the form “LATITUDE+LONGITUDE”. To simplify the code, I use the paste() function to build coordinates in this format without altering the original spatial points data frame.

AustinPublicArt_SP@coords[1,] #access one pair of lat/long coordinates from the spatial points data frame
## art_location_long  art_location_lat 
##         -97.80663          30.26656
paste(AustinPublicArt_SP@coords[1,2],AustinPublicArt_SP@coords[1,1], sep="+") #reformat this pair to be of the form "lat+long"
## [1] "30.2665587+-97.8066293"
library(gmapsdistance) #utilizes Google Maps data to calculate travel distances or times by bicycle, walking, driving, or public transportation
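Putting the pieces together, a call might look like the following sketch (not run here), which requests the walking time and distance between the first two artworks. Note that the Google Maps API requires a key, which gmapsdistance accepts via set.api.key(); "YOUR_API_KEY" below is a placeholder:

```r
# set.api.key("YOUR_API_KEY") # required by the Google Maps API
origin <- paste(AustinPublicArt_SP@coords[1,2], AustinPublicArt_SP@coords[1,1], sep="+")
destination <- paste(AustinPublicArt_SP@coords[2,2], AustinPublicArt_SP@coords[2,1], sep="+")
results <- gmapsdistance(origin = origin, destination = destination, mode = "walking")
results$Time     # travel time in seconds
results$Distance # travel distance in meters
```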

Example 2: Places of Worship

Background: The dataset contains locations and attributes of Places of Worship, created as part of the DC Geographic Information System (DC GIS) for the D.C. Office of the Chief Technology Officer (OCTO) and participating D.C. government agencies. Information provided by various sources identified Places of Worship such as churches and faith-based organizations. DC GIS staff geo-processed this data from a variety of sources.

PlacesOfWorship <- read.csv("https://query.data.world/s/enqqu329pdhaq78usawjway3a", header=TRUE, stringsAsFactors=FALSE)
colnames(PlacesOfWorship)
## [1] "objectid"  "gis_id"    "name"      "religion"  "address"   "latitude" 
## [7] "longitude"
PlacesOfWorship_SP <- SpatialPointsDataFrame(coords = PlacesOfWorship[,c("longitude","latitude")], data = PlacesOfWorship[,1:5]) #coords must be ordered (longitude, latitude), i.e. (x, y)

head(PlacesOfWorship_SP@data) #view attributes available
##   objectid  gis_id                                name  religion
## 1        1   pow_1   Sixth Church of Christ, Scientist CHRISTIAN
## 2        2   pow_2 Seventh Church of Christ, Scientist CHRISTIAN
## 3       11  pow_12         Christian Tabernacle Church CHRISTIAN
## 4      101 pow_152  Ohev Sholom The National Synagogue    JEWISH
## 5      102 pow_153          Palisades Community Church CHRISTIAN
## 6      103 pow_154                   Paramount Baptist CHRISTIAN
##                        address
## 1 4601 MASSACHUSETTS AVENUE NW
## 2           1204 47TH PLACE NE
## 3          2033 11TH STREET NW
## 4       1600 JONQUIL STREET NW
## 5     5200 CATHEDRAL AVENUE NW
## 6           3924 4TH STREET SE

These data are extra neat because for each house of worship we have two kinds of geographic information:

  1. Address (in String form, where you would send mail)
  2. Geocoded (pinpointed down to a single latitude/longitude pair)

This saves us the preliminary geocoding step. Instead, we can begin by repeating the subset process to get our base layer shape for Washington, D.C. We can find this outline by downloading the States (and equivalent) shapefile from the United States Census Bureau. Helpful tidbit: the FIPS code for the District of Columbia is 11. Using the FIPS code instead of the string “District of Columbia” sidesteps any issues with capitalization or abbreviation.

Using GISTools, we can plot this shapefile with the following code:

States <- readOGR(path.expand("tl_2016_us_state/tl_2016_us_state.shp"), "tl_2016_us_state")
## OGR data source with driver: ESRI Shapefile 
## Source: "tl_2016_us_state/tl_2016_us_state.shp", layer: "tl_2016_us_state"
## with 56 features
## It has 14 fields
## Integer64 fields read as strings:  ALAND AWATER
WashingtonDC_shape <- subset(States, States@data$STATEFP == 11)
par(mar=c(0,0,0,0))
plot(WashingtonDC_shape, col="cornflowerblue")

In ggplot2, we can alternatively plot it using either of the following commands. The first repeats the steps from the Austin, Texas example: fortify the shapefile, then plot the latitude and longitude coordinates as a polygon using geom_polygon(). The second is new! Within ggplot2, there is a command called map_data() that allows you to directly read in common map shapes. We couldn’t use this before because we were focusing on a small area, but now would be a fun time to explore it to plot Washington, D.C.!

WashingtonDC_df <- fortify(WashingtonDC_shape, region = "NAME")
ggplot() + geom_polygon(data = WashingtonDC_df, aes(x = long, y = lat, group = group), fill = "cornflowerblue") + ggtitle("Washington D.C. (from US Census)")

StateShapes <- map_data("state")
WashingtonDC_gg <- subset(StateShapes, StateShapes$region == "district of columbia")
ggplot() + geom_polygon(data = WashingtonDC_gg, aes(x = long, y = lat, group = region), fill="cornflowerblue") + ggtitle("Washington D.C. (from ggplot)")

Personally, I prefer the precision of the shapefile to that of the map_data() plot, but for maps of larger areas (e.g. all of the states) map_data() is definitely a great option!

Now that we know the building blocks of a basic map, let’s mix things up a bit by mapping data points with an attribute from the Places of Worship data set.

Mapping location and attributes together

To illustrate incorporating attributes with spatial data, we will create a map of the places of worship in Washington, D.C. in which the religion of each location can be identified by a unique symbol/color combination.

To assign a unique symbol and color to each religion, we add “Symbol” and “Color” columns to the data frame portion of the Places of Worship spatial points data frame.

Religions <- unique(PlacesOfWorship$religion)
Colors <- c("red", "orange", "yellow", "green", "blue")
Symbols <- c(15, 16, 17, 18, 20)

for (i in 1:length(Religions))
{
  PlacesOfWorship_SP@data$Symbol[which(PlacesOfWorship_SP@data$religion == Religions[i])] <- Symbols[i]
  PlacesOfWorship_SP@data$Color[which(PlacesOfWorship_SP@data$religion == Religions[i])] <- Colors[i]
}
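As an aside, the loop above can also be written without a for loop. match() looks up each row's religion in the Religions vector and returns its position, which indexes directly into the Symbols and Colors vectors:

```r
# Loop-free equivalent: match() returns, for each row, the position of its
# religion within the Religions vector
idx <- match(PlacesOfWorship_SP@data$religion, Religions)
PlacesOfWorship_SP@data$Symbol <- Symbols[idx]
PlacesOfWorship_SP@data$Color  <- Colors[idx]
```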
par(mar=c(1,1,1,1))
plot(WashingtonDC_shape, asp=1, col="white", bg="wheat", lty=1, main="Places of Worship by Denomination (Washington, D.C.)")
# maps::map.scale(-76.99345, 38.5940, metric=FALSE, ratio = FALSE)
# north.arrow(-77.76864, 39.348, len=1/40, col="red")
plot(PlacesOfWorship_SP, add=TRUE, col=PlacesOfWorship_SP@data$Color, lwd=1, pch=PlacesOfWorship_SP@data$Symbol)
legend(x = "bottomleft", pch=Symbols, col=Colors, legend=Religions)

With an abundance of Christian churches, it’s difficult to see the distribution of other religions.

ftable(PlacesOfWorship_SP@data$religion)
##  BUDDHIST CHRISTIAN JEWISH MUSLIM SCIENTOLOGY
##                                              
##         5       813     12     14           1

From this quick frequency table, we can see that an overwhelming 96% of the houses of worship in the data set (813 of 845) are Christian. Perhaps it would be more insightful to view houses of worship belonging to all other religions individually on this map, and then look at the spatial distribution of only the Christian locations.
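That percentage can be computed exactly from the same frequency table with prop.table():

```r
# Exact share of each religion among the 845 locations in the data set
round(prop.table(table(PlacesOfWorship_SP@data$religion)), 3)
# CHRISTIAN comes out to roughly 0.962 (813 of 845)
```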

par(mar=c(1,1,1,1))
plot(WashingtonDC_shape, asp=1, col="white", bg="wheat", lty=1, main="Places of Worship by Denomination (Washington, D.C.)")
# maps::map.scale(-76.99345, 38.5940, metric=FALSE, ratio = FALSE)
# north.arrow(-77.76864, 39.348, len=1/40, col="red")
NotChristian_SP <- subset(PlacesOfWorship_SP, PlacesOfWorship_SP@data$religion != "CHRISTIAN")
plot(NotChristian_SP, add=TRUE, col=NotChristian_SP@data$Color, lwd=3, pch=NotChristian_SP@data$Symbol, sub = "Excluding Christian")
legend(x = "bottomleft", pch=Symbols[2:5], col=Colors[2:5], legend=Religions[2:5])

Getting colorful with choropleth maps

A choropleth is a map that shades geographic areas (counties, census blocks, urban areas, etc.) according to the relative value of a given attribute. Within Washington, D.C., we can use a choropleth to examine the relative proportions of Christian houses of worship located in each census block group. To do so, we will need the [District of Columbia Block Group shapefile](https://www.census.gov/cgi-bin/geo/shapefiles/index.php?year=2016&layergroup=Block+Groups).

CensusBlocks_WashingtonDC <- readOGR(path.expand("tl_2016_11_bg/tl_2016_11_bg.shp"), "tl_2016_11_bg")
## OGR data source with driver: ESRI Shapefile 
## Source: "tl_2016_11_bg/tl_2016_11_bg.shp", layer: "tl_2016_11_bg"
## with 450 features
## It has 12 fields
## Integer64 fields read as strings:  ALAND AWATER

For R to create the shading for our choropleth map, we need to help it out by tabulating the number of Christian houses of worship in each census block group. We can do this quite simply with the poly.counts() function, which counts the number of spatial points (from the Christian subset of Places of Worship) that fall inside each spatial polygon (the census block groups from the shapefile).

CensusBlocks_WashingtonDC@data$CountChristian <- c(poly.counts(subset(PlacesOfWorship_SP, PlacesOfWorship_SP@data$religion == "CHRISTIAN"), CensusBlocks_WashingtonDC))

Now that we have added a “CountChristian” column that tabulates the number of Christian places of worship contained within each census block group we are ready to create our choropleth.

par(mar=c(1,1,1,1))
plot(WashingtonDC_shape, asp=1, col="white", bg="wheat", lty=1)
choropleth(CensusBlocks_WashingtonDC, CensusBlocks_WashingtonDC@data$CountChristian, add=TRUE)
title(main="Christian Houses of Worship by Census Block Group")

So pretty! But what do the colors mean? Before we add a legend to our beautiful choropleth, we need to talk about shading and color schemes. The choropleth() function takes the range of counts (from 0 to the most Christian churches observed in a single census block group) and creates five categories. We can create our own custom color scheme using the auto.shading() command with brewer.pal() inside of it. brewer.pal() generates color palettes from the RColorBrewer package based on a named palette input. Examples of palettes to choose from include sequential classics like “Reds” (the current default for this choropleth), “Oranges”, “Greens”, “Blues”, and “Purples”, as well as qualitative options like “Accent”, “Pastel1”, and “Pastel2”.
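If you want to browse the options before committing, RColorBrewer can display every palette it ships with, or return the hex codes for a single one:

```r
# Preview all available ColorBrewer palettes in one plot
library(RColorBrewer)
display.brewer.all()
# Inspect the five hex codes for one named palette
brewer.pal(5, "Accent")
```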

ChristianColors <- auto.shading(CensusBlocks_WashingtonDC@data$CountChristian, cols = brewer.pal(5, "Accent"))

Now that we have designated a shading scheme for the choropleth, we can use it to add a legend to our map. This legend gives the colors some meaning in the context of the graphic.

par(mar=c(1,1,1,1))
plot(WashingtonDC_shape, asp=1, col="white", bg="wheat", lty=1)
choropleth(CensusBlocks_WashingtonDC, CensusBlocks_WashingtonDC@data$CountChristian, shading = ChristianColors, add=TRUE)
choro.legend(px = -76.91984, py = 38.845, sh = ChristianColors, cex=0.7)
title(main="Christian Houses of Worship by Census Block Group")